CENSREC-AV: evaluation frameworks for audio-visual speech recognition

نویسندگان

Satoshi Tamura

Chiyomi Miyajima

Norihide Kitaoka

Satoru Hayamizu

Kazuya Takeda

چکیده

This paper introduces incoming evaluation frameworks for bimodal speech recognition in noisy conditions and real environments. In order to develop a robust speech recognition in noisy environments, bimodal speech recognition which uses acoustic and visual information has been paid attention to particularly for this decade. As a lot of methods and techniques for bimodal speech recognition have been proposed, a common evaluation framework, including audio-visual speech data and baseline system, is needed to estimate and compare these techniques and bimodal speech recognition schemes. Audio-visual evaluation frameworks, CENSREC-1-AV and CENSREC-2-AV, have been being built by the CENSREC project in Japan; CENSREC1-AV includes artificially noise-added waveforms and image sequences, whereas CENSREC-2-AV consists of audio-visual data recorded in in-car environments. A baseline method and its recognition results will be also provided with these corpora.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

CENSREC-1-AV: an audio-visual corpus for noisy bimodal speech recognition

In this paper, an audio-visual speech corpus CENSREC-1-AV for noisy speech recognition is introduced. CENSREC-1-AV consists of an audio-visual database and a baseline system of bimodal speech recognition which uses audio and visual information. In the database, there are 3,234 and 1,963 utterances made by 42 and 51 speakers as a training and a test sets respectively. Each utterance consists of ...

متن کامل

Stream weight estimation using higher order statistics in multi-modal speech recognition

In this paper, stream weight optimization for multi-modal speech recognition using audio information and visual information is examined. In a conventional multi-stream Hidden Markov Model (HMM) used in multi-modal speech recognition, a constraint in which the summation of audio and visual weight factors should be one is employed. This means balance between transition and observation probabiliti...

متن کامل

Evaluation Framework for Distant-talking Speech Recognition under Reverberant Environments: newest Part of the CENSREC Series -

Recently, speech recognition performance has been drastically improved by statistical methods and huge speech databases. Now performance improvement under such realistic environments as noisy conditions is being focused on. Since October 2001, we from the working group of the Information Processing Society in Japan have been working on evaluation methodologies and frameworks for Japanese noisy ...

متن کامل

Audio-visual interaction in sparse representation features for noise robust audio-visual speech recognition

In this paper, we investigate audio-visual interaction in sparse representation to obtain robust features for audio-visual speech recognition. Firstly, we introduce our system which uses sparse representation method for noise robust audio-visual speech recognition. Then, we introduce the dictionary matrix used in this paper, and consider the construction of audio-visual dictionary. Finally, we ...

متن کامل

CENSREC-4: development of evaluation framework for distant-talking speech recognition under reverberant environments

In this paper, we newly introduce a collection of databases and evaluation tools called CENSREC-4, which is an evaluation framework for distant-talking speech under hands-free conditions. Distant-talking speech recognition is crucial for a handsfree speech interface. Therefore, we measured room impulse responses to investigate reverberant speech recognition in various environments. The data con...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2008

CENSREC-AV: evaluation frameworks for audio-visual speech recognition

نویسندگان

چکیده

منابع مشابه

CENSREC-1-AV: an audio-visual corpus for noisy bimodal speech recognition

Stream weight estimation using higher order statistics in multi-modal speech recognition

Evaluation Framework for Distant-talking Speech Recognition under Reverberant Environments: newest Part of the CENSREC Series -

Audio-visual interaction in sparse representation features for noise robust audio-visual speech recognition

CENSREC-4: development of evaluation framework for distant-talking speech recognition under reverberant environments

عنوان ژورنال:

اشتراک گذاری